Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maint: Remove use of folly/hash/Hash.h #1506

Closed
wants to merge 1 commit into from

Conversation

ianthomas23
Copy link
Collaborator

Reference Issues/PRs

Fixes the 5th item of the folly replacement plan, issue #1412.

What does this implement or fix?

This removes the single include of folly/hash/Hash.h. The only explicit use of this is folly::hash::hash_combine which I have replaced with use of boost::hash_combine.

It would alternatively be possible to vendor the 3 or so folly functions, but I have rejected this idea. One of the functions is hash_128_to_64 and is taken from cityhash which is essentially unmaintained for 11 years. I would rather use the actively maintained boost.

Any other comments?

There are some tests of folly::hash::commutative_hash_combine in test_hash.cpp, but this function is not used anywhere else in the ArcticDB code base so I have removed the test code as not being relevant.

@ianthomas23 ianthomas23 mentioned this pull request Apr 18, 2024
17 tasks
Comment on lines +105 to +112
auto seed = hash(id_);
boost::hash_combine(seed, version_id_);
boost::hash_combine(seed, creation_ts_);
boost::hash_combine(seed, content_hash_);
boost::hash_combine(seed, key_type_);
boost::hash_combine(seed, index_start_);
boost::hash_combine(seed, index_end_);
hash_ = seed;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the returned values of get_cached_hash must not change for backward compatibility (e.g. to be able to retrieve segments which have been stored in databases in previous versions).

Is there a way we could test that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's the case, however I agree we should check. I believe we have version backwards and forwards compatibility checks, I'll find out where they are and point you at them

@jjerphan jjerphan changed the title [maint]: Remove use of folly/hash/Hash.h maint: Remove use of folly/hash/Hash.h May 20, 2024
Copy link
Collaborator

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

The backwards and forwards compatibility checks are performed by .github/workflows/persistent_storage.yml.

@ianthomas23 ianthomas23 force-pushed the ianthomas23/folly_hash branch 2 times, most recently from c42f0dc to 0484479 Compare June 11, 2024 07:19
@alexowens90
Copy link
Collaborator

I'd like to see some numbers confirming that the performance of the boost implementation is at least comparable to folly's. Should be sufficient to generate a massive vector of AtomKeys and then hash them all. We have code that does similar things already:
https://github.com/man-group/ArcticDB/blob/master/cpp/arcticdb/processing/test/benchmark_clause.cpp
Unfortunately there isn't currently a clean way to compare timings across commits. I'd suggest writing a benchmark on your branch, running it, then backport just the benchmark to master and rerun, and post the result here.

@ianthomas23
Copy link
Collaborator Author

I've benchmarked as suggested using this code

void BM_hash_AtomKey(benchmark::State& state) {
    int n = state.range(0);
    std::vector<AtomKeyImpl> v;

    for (int i = 0; i < n; i++) {
        auto k = AtomKeyBuilder().version_id(i).start_index(i*10).end_index(i*100).build<KeyType::TABLE_INDEX>("");
        v.push_back(k);
    }

    for (auto _ : state) {
        for (auto k : v) {
            benchmark::DoNotOptimize(hash(k));
        }
    }
}

BENCHMARK(BM_hash_AtomKey)->Args({100'000})->Args({1'000'000})->Args({10'000'000});

Results for release build of master (commit cb607222) are:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_hash_AtomKey/100000         2.02 ms         2.02 ms          345
BM_hash_AtomKey/1000000        20.4 ms         20.4 ms           34
BM_hash_AtomKey/10000000        203 ms          203 ms            3

and for this branch (commit 21c458c8d) are:

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_hash_AtomKey/100000         3.33 ms         3.33 ms          210
BM_hash_AtomKey/1000000        33.5 ms         33.5 ms           21
BM_hash_AtomKey/10000000        335 ms          335 ms            2

So this PR's use of boost is at least 50% slower than folly. I am closing this PR, to keep similar performance to folly the relevant folly code will need to be vendored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants